Large Scale Text Analysis
نویسندگان
چکیده
We take an algorithmic and computational approach to the problem of providing patent recommendations, developing a web interface that allows users to upload their draft patent and returns a list of ranked relevant patents in real time. We develop scalable, distributed algorithms based on optimization techniques and sparse machine learning, with a focus on both accuracy and speed.
منابع مشابه
Turning Quantitative: An Analytic Scale to Do Critical Discourse Analysis
Critical Discourse Analysis (CDA) could be seen as a theory in qualitative more than in qualitative stud- ies. This might have led to difficulty in doing CDA. Accordingly, this study attempted to develop a quan- titative profile in the form of an analytic rubric. For this purpose, Fairclough’s model of CDA was select- ed as the research framework. The techniques used for structuring analy...
متن کاملChapter 9 MINING TEXT STREAMS
The large amount of text data which are continuously produced over time in a variety of large scale applications such as social networks results in massive streams of data. Typically massive text streams are created by very large scale interactions of individuals, or by structured creations of particular kinds of content by dedicated organizations. An example in the latter category would be the...
متن کاملA comparative study of the text inside the Mihrabi rug by Zareh Penyamin and Topkapi Palace Museum according to the existing discourse in the 16th and 19th
IIn the country of Turkey, in the city of Hereke, at the end of the 19th century, rugs known as Mihrabi became popular, which were inspired by the rugs of the Safavid era and kept in the Topkapi Palace Museum. In these rugs, which are reproduced in royal workshops on a large scale, some changes have been made in the verbal text and incorporated visual elements. Among the rugs that seem to have ...
متن کاملLarge Scale Corpus Analysis and Recent Applications
Recent progress of corpus and machine learning-based natural language processing methodologies have made it possible to handle large scale corpus with a quite high accuracy. The speaker is now involved in a project for constructing a large scale contemporary Japanese balanced corpus, aiming at constructing automatic annotation tools on various levels of natural language analyses. I will first i...
متن کاملAutomating the analysis of collaborative discourse: identifying idea clusters
This poster explores CSCL practices relating to the use of a tool that employs information visualization techniques and large-scale text processing and analysis to complement qualitative analysis of collaborative discourse. Results from latent semantic analysis and qualitative analysis of online discussion transcripts are compared. Findings suggest that such tools that automate analyses of larg...
متن کاملStraightforward Feature Selection for Scalable Latent Semantic Indexing
Latent Semantic Indexing (LSI) has been validated to be effective on many small scale text collections. However, little evidence has shown its effectiveness on unsampled large scale text corpus due to its high computational complexity. In this paper, we propose a straightforward feature selection strategy, which is named as Feature Selection for Latent Semantic Indexing (FSLSI), as a preprocess...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015